Pitch Period Estimation by Filtering the Fundamental Frequency out of the Speech Waveform

Author

  • Dick R. van Bergem
Abstract

Over the past decades, many algorithms for estimating the fundamental frequency of speech signals have been proposed; for an overview of the most important ones, see Hess (1983). These algorithms work either in the time domain or in the spectral domain. Spectral-domain methods use a short-term analysis of the speech signal: the average pitch period duration is estimated for (usually overlapping) frames that contain a small number of successive pitch periods. Time-domain methods estimate a train of laryngeal pulses, which are called pitch markers.

Time-domain methods have some clear advantages over spectral methods, provided they work properly. First, they locate pitch periods exactly, whereas spectral methods can only give estimates of pitch contours and may have difficulty with frames containing both voiced and unvoiced speech. Second, a pitch contour derived from a series of pitch markers can be checked very easily by careful examination of the markers, and it can easily be hand-corrected by removing or adding markers. Third, time-domain methods can (contrary to spectral methods) be used for pitch-synchronous techniques such as pitch-synchronous Fourier transformations or the PSOLA technique, with which the prosodic features of speech can be manipulated (Hamon et al., 1989; Charpentier and Moulines, 1989).

Recently, an interesting time-domain pitch extraction method was proposed by Dologlou and Carayannis (1989). They filtered the fundamental frequency out of the (digitized) speech waveform by iteratively applying a lowpass filter (a 3-point Hanning window). Comparing the filtered fundamental frequency signal with the output signal of a laryngograph, they observed that the points of closure of the vocal cords were estimated very well by the drops in the fundamental frequency signal extracted by their method.
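The iterative lowpass filtering described above can be illustrated with a minimal sketch. It assumes the 3-point Hanning window is the kernel [0.25, 0.5, 0.25] and that edge samples are repeated at the frame boundaries; the function names, the sampling rate, and the iteration count are illustrative choices, not taken from Dologlou and Carayannis.

```python
import numpy as np

def hanning3_smooth(x, n_iter):
    """Apply the 3-point kernel [0.25, 0.5, 0.25] n_iter times (assumed kernel)."""
    kernel = np.array([0.25, 0.5, 0.25])
    y = np.asarray(x, dtype=float).copy()
    for _ in range(n_iter):
        # Repeat the edge samples so every pass keeps the input length.
        padded = np.pad(y, 1, mode="edge")
        y = np.convolve(padded, kernel, mode="valid")
    return y

def gain_after(n_iter, freq_hz, fs_hz):
    """Gain of n_iter passes at freq_hz: one pass has gain cos(pi * f / fs) ** 2."""
    return np.cos(np.pi * freq_hz / fs_hz) ** (2 * n_iter)

# Synthetic "voiced" signal: a 100 Hz fundamental plus two harmonics.
fs = 8000                      # hypothetical sampling rate in Hz
t = np.arange(fs) / fs         # one second of samples
x = (np.sin(2 * np.pi * 100 * t)
     + 0.8 * np.sin(2 * np.pi * 200 * t)
     + 0.6 * np.sin(2 * np.pi * 300 * t))

# After many passes the harmonics are attenuated far more than the fundamental.
y = hanning3_smooth(x, 500)
for f in (100, 200, 300):
    print(f, "Hz gain after 500 passes:", gain_after(500, f, fs))
```

Because each pass attenuates higher frequencies more strongly than lower ones, repeating the pass leaves a signal dominated by the lowest component. How many passes are needed is exactly the stopping question that the termination criterion has to answer.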
The crucial part of their algorithm is the criterion by which the iterative filtering is terminated. At the end of each iteration, the algorithm checks whether more than one frequency component is still present in the filtered signal. For this purpose Dologlou and Carayannis use the autocorrelation function and a second-order LPC analysis, which give the same estimate of the frequency of a pure sinusoid; when the difference between these two estimates falls below a predetermined threshold, the algorithm stops. However, the assumption that the fundamental frequency signal is a pure sinusoid is of course not correct, because the fundamental frequency changes continuously. Even within a short stretch of speech (Dologlou and Carayannis used non-overlapping frames of 100 ms) the fundamental frequency varies, so the threshold should be chosen rather conservatively. On the other hand, this might lead to a premature termination of the iterations. The difficulty of choosing a proper threshold is illustrated by the fact that the authors fail to mention which threshold they propose.
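The idea behind this stopping test can be sketched as follows. The abstract does not give the authors' formulas, so the two estimators below are assumptions chosen because both return the exact frequency for a pure sinusoid: a lag-1 normalized correlation, and the first coefficient of a least-squares second-order linear predictor. The function names and the tolerance `tol_hz` are hypothetical.

```python
import numpy as np

def freq_from_autocorr(x, fs_hz):
    """Lag-1 correlation estimate (assumed form).

    For a pure sinusoid x[n-1] + x[n+1] = 2*cos(w)*x[n], so
    cos(w) = sum(x[n] * (x[n-1] + x[n+1])) / (2 * sum(x[n]**2)).
    """
    x = np.asarray(x, dtype=float)
    mid = x[1:-1]
    cos_w = np.dot(mid, x[:-2] + x[2:]) / (2.0 * np.dot(mid, mid))
    return np.arccos(np.clip(cos_w, -1.0, 1.0)) * fs_hz / (2.0 * np.pi)

def freq_from_lpc2(x, fs_hz):
    """Second-order predictor x[n] ~ a1*x[n-1] + a2*x[n-2], fitted by least squares.

    For a pure sinusoid a1 = 2*cos(w) and a2 = -1, so w = arccos(a1 / 2).
    """
    x = np.asarray(x, dtype=float)
    past = np.column_stack([x[1:-1], x[:-2]])
    (a1, a2), *_ = np.linalg.lstsq(past, x[2:], rcond=None)
    return np.arccos(np.clip(a1 / 2.0, -1.0, 1.0)) * fs_hz / (2.0 * np.pi)

def looks_like_single_sinusoid(x, fs_hz, tol_hz=1.0):
    """Stop test: the two estimates agree to within tol_hz (hypothetical threshold)."""
    return abs(freq_from_autocorr(x, fs_hz) - freq_from_lpc2(x, fs_hz)) < tol_hz

fs = 8000                                   # hypothetical sampling rate in Hz
n = np.arange(800)                          # one 100 ms frame
pure = np.sin(2 * np.pi * 100 * n / fs + 0.3)
mixed = pure + np.sin(2 * np.pi * 300 * n / fs)

print("pure :", looks_like_single_sinusoid(pure, fs))
print("mixed:", looks_like_single_sinusoid(mixed, fs))
```

In this sketch the two estimates coincide for a single sinusoid and move apart when a second component is present. For a frame in which the fundamental frequency drifts, they will generally not coincide exactly, which is why the choice of tolerance matters.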

Similar resources

Vocoded Speech in the Absence of the Laryngeal Frequency

Most pitch excited channel vocoders require the fundamental or laryngeal frequency of the input speech to be present if the output speech is to be of high quality. In order to determine if speech whose fundamental is absent can have its pitch accurately restored so as to be used as an input to a vocoder, a computer simulation was performed. The fundamental was restored by passing the speech thr...

Musical Acoustics and Speech Communication: Musical Pitch Tracking and Sound Source Separation Leading to Automatic Music Transcription I

A powerful pitch estimation algorithm called SWIPE has been developed for processing speech and music. SWIPE is shown to outperform existing algorithms on several publicly available speech and musical instrument databases, and a disordered speech database, reducing the gross error rate by 40%, relative to the best competing algorithm. In short, SWIPE estimates the pitch as the fundamental frequ...

Pitch analysis methods for cross-speaker comparison

A system of fundamental frequency analysis and normalisation is described for obtaining pitch data and comparing them across speakers. This system was used for the analysis of English and Spanish speakers' productions in order to compare the realization of accentual focus in the two languages. The system is based on the simultaneous recording of speech and the laryngeal signal. The latter is mo...

New Time-Frequency Domain Pitch Estimation Methods for Speech Signals under Low Levels of SNR

Celia Shahnaz, Ph.D., Concordia University, 2009. Pitch estimation of speech signals is the key to understanding most acoustical phenomena as well as accurately designing many practical systems in speech communication. Its aim is to determine the fundamental frequency or period of a vocal cord vibration causi...

Speech analysis by subspace methods of spectral line estimation

Over frames of short time duration, filtered speech may be described as a finite linear combination of sinusoidal components. In the case of a frame of voiced speech the frequencies are considered to be harmonics of a fundamental frequency. It can be assumed further that the speech samples are observed in additive white noise of zero mean, resulting in a standard signal-plus-noise model. This m...

Publication date: 2016